Distributed job manager for stateful microservices

ABSTRACT

Two servers implemented as containerized applications may manage the storage of data and the access of that data by compute jobs in a distributed system. A metadata server may distribute data on ingress and assign files to particular storage volumes. The metadata server may then provide a lookup function for files and be configured to distribute a file to other volumes when necessary. A job server may launch jobs as containerized applications and coordinate data access across jobs.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 15/907,181 (Attorney Docket No. PRTWP011), titled “Hyper-Convergence with Scheduler Extensions for Software-Defined Container Storage Solutions”, by Israni et al., filed Feb. 27, 2018, which is hereby incorporated by reference in its entirety and for all purposes.

TECHNICAL FIELD

The present disclosure relates generally to containerized applications and more specifically to containerized scalable storage applications.

DESCRIPTION OF RELATED ART

Many computing jobs, such as many of those related to machine learning and TensorFlow, perform functions on some data stored in a data lake. In conventional systems, such data is often stored in a shared siloed infrastructure which is accessed remotely by all the jobs. For example, physical disks across many machines may be aggregated into a single logical volume. Then, an entity such as a lock manager maintains data synchronization across the shared storage solution.

Computing tasks of this type typically involve three phases. In the first phase, data is ingested and stored in the data lake. In the second phase, a number of distributed jobs are created. Some portions of the data are typically accessed by only a single job, while other portions of the data are typically accessed by many jobs. In the third phase, result sets created by the jobs are aggregated and employed for a task such as inference. Frequently, the data itself serves as an input for the jobs but is not modified by the jobs. For example, the data may be used as a training set for a machine learning computation process.

One of the most difficult challenges facing software developers is interoperability of software between different computing environments. Software written to run in one operating system typically will not run without modification in a different operating system. Even within the same operating system, a program may rely on other programs in order to function. Each of these dependencies may or may not be available on any given system, or may be available but in a version different from the version originally relied upon. Thus, dependency relationships further complicate efforts to create software capable of running in different environments.

In recent years, the introduction of operating-system-level virtualization has facilitated the development of containerized software applications. A system configured with operating-system-level virtualization includes a container engine that operates on top of the operating system. Importantly, the container engine is configured to operate interchangeably in different environments (e.g., with different operating systems). At the same time, the container engine is configured to present a standardized interface to one or more software containers.

Each software container may include computer programming code for performing one or more tasks. Examples of software containers include web servers, email servers, web applications, and other such programs. Each software container may include some or all of the software resources that the software in the container needs in order to function. For example, if a software container includes a web application written in the Python programming language, the software container may also include the Python programming language modules that the web application relies upon. In this way, the software container may be installed and may execute successfully in different computing environments as long as the environment includes a container engine.

SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding of certain embodiments of the invention. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.

Various embodiments of the present invention relate generally to devices, systems, and methods for managing distributed stateful microservice jobs. According to various embodiments, a system may include a plurality of computing nodes that each include a respective processor, a respective memory module, and a respective communications interface. Each computing node may be configured to execute a compute job upon request and/or may include a storage interface configured to communicate with a respective virtual storage volume. A system may also include a metadata server configured to distribute a plurality of files among the respective virtual storage volumes and to identify upon request the virtual storage volume associated with a designated one of the files. A system may also include a job server configured to initiate a respective one or more compute jobs on each of the plurality of computing nodes. Each compute job may access one or more files stored on the respective virtual storage volume associated with the respective computing node on which the respective compute job is initiated.

These and other embodiments are described further below with reference to the figures.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate particular embodiments.

FIG. 1 illustrates an example of a storage container node, configured in accordance with one or more embodiments.

FIG. 2 illustrates an example of an arrangement of components in a containerized storage system, configured in accordance with one or more embodiments.

FIG. 3 illustrates an example of a method for controlling the loading of a container on a containerized application node, performed in accordance with one or more embodiments.

FIG. 4 illustrates an example of a method for prioritizing container loading, performed in accordance with one or more embodiments.

FIG. 5 illustrates an example of a server, configured in accordance with one or more embodiments.

FIG. 6 illustrates an example of a configuration of nodes, provided in accordance with one or more embodiments.

FIG. 7 illustrates an example of a configuration of nodes, provided in accordance with one or more embodiments.

FIG. 8 illustrates an example of an alternate method for container loading, performed in accordance with one or more embodiments.

FIG. 9 illustrates an example of a method for loading data in a distributed storage system, performed in accordance with one or more embodiments.

FIG. 10 illustrates an example of a method for executing one or more jobs, performed in accordance with one or more embodiments.

FIG. 11 illustrates an example of a method for terminating one or more jobs, performed in accordance with one or more embodiments.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Reference will now be made in detail to some specific examples of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.

For example, the techniques of the present invention will be described in the context of particular containerized storage environments. However, it should be noted that the techniques of the present invention apply to a wide variety of different containerized storage environments. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. Particular example embodiments of the present invention may be implemented without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

Various techniques and mechanisms of the present invention will sometimes be described in singular form for clarity. However, it should be noted that some embodiments include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise. For example, a system uses a processor in a variety of contexts. However, it will be appreciated that a system can use multiple processors while remaining within the scope of the present invention unless otherwise noted. Furthermore, the techniques and mechanisms of the present invention will sometimes describe a connection between two entities. It should be noted that a connection between two entities does not necessarily mean a direct, unimpeded connection, as a variety of other entities may reside between the two entities. For example, a processor may be connected to memory, but it will be appreciated that a variety of bridges and controllers may reside between the processor and memory. Consequently, a connection does not necessarily mean a direct, unimpeded connection unless otherwise noted.

Example Embodiments

Many computing jobs, such as many of those related to machine learning, perform functions on some data stored in a data lake. In conventional systems, such data is often stored in a shared siloed infrastructure which is accessed remotely by all the jobs. However, such a configuration typically imposes several drawbacks. First, accessing storage shared across many machines results in substantially slower performance. Second, for shared storage to maintain data synchronization, an entity such as a lock manager is needed for coordination purposes. Such an entity frequently creates a bottleneck for data access by jobs. Third, such a configuration often involves a single point of failure. For example, if the coordination entity that provides access to the distributed storage fails, then all jobs will be affected. Fourth, if storage is shared across many nodes, then a particular job running on a local node will typically be retrieving data from a remote node rather than from the local node, thus imposing a substantial reduction in performance and a substantial increase in communications traffic between nodes.

Conventional approaches do not provide for an architecture that supports file distribution at the file level with jobs taking advantage of the distribution to run more efficiently. In contrast, techniques and mechanisms described herein provide for consistent and efficient file distribution over a distributed cluster, with jobs being configured to query file locations and employ a portion of the storage rather than relying on the entire namespace.

According to various embodiments, techniques and mechanisms described herein provide for an architecture in which two servers, implemented as containerized applications, manage the storage of data and the access of that data by compute jobs in a distributed system. A metadata server may distribute data on ingress and assign files to particular storage volumes. The metadata server may then provide a lookup function for files and be configured to distribute a file to other volumes when necessary. Techniques and mechanisms related to the functioning of the metadata server are described throughout the application, and more specifically with respect to FIG. 9. A job server may launch jobs as containerized applications and coordinate data access across jobs. Techniques and mechanisms related to the functioning of the job server are described throughout the application, and more specifically with respect to FIG. 10 and FIG. 11.

In various embodiments, techniques and mechanisms described herein may provide any or all of several advantages in comparison to conventional approaches. First, techniques and mechanisms described herein may provide for better performance in comparison to conventional techniques. Because multiple volumes are used rather than a single volume, the IO load can be distributed throughout the cluster rather than relying on a single namespace. Second, some or all of the volumes can be replicated across multiple nodes. Then, if a node fails, the job corresponding to that node can be restarted on a different node. Third, the architecture can provide for hyperconvergence. Using scheduler extenders, jobs can be placed on the same node as the data, providing lower latency for IO access and reducing network load. Fourth, since data is spread out over multiple volumes, no single volume provides a bottleneck for the various jobs.

Techniques and mechanisms described herein facilitate the operation of distributed stateful microservices in conjunction with a distributed, containerized storage system. In a containerized application system based on technology such as Docker or Kubernetes, each compute node implements a container layer that runs in an operating system. The container layer acts as an intermediate layer to facilitate the execution of one or more container applications. The container system is standardized so that a container application may be instantiated on any of various operating systems and on any of various types of hardware.

In some embodiments, each compute node may include a storage driver configured to facilitate access between applications loaded on the compute node and one or more storage volumes mounted on the compute node. The storage driver may be implemented as a containerized application having special permissions beyond those typically accorded to containerized applications in the system, a configuration referred to herein as a privileged storage container. Techniques and mechanisms related to privileged storage containers are discussed in further detail with respect to FIG. 1 and elsewhere herein.

In many configurations, potentially many instances of a container application are created on potentially many different nodes. A clustered storage solution can be employed to provide access to data. In a clustered storage solution, a virtual storage volume can be created. Such a virtual storage volume can span potentially many different physical disks and can be made accessible to any of the nodes.

According to various embodiments, techniques and mechanisms described herein employ a container-specific rather than an OS-specific augmentation that influences a container scheduler to schedule nodes based on the virtual storage volumes used by a particular container. In particular, the scheduler extension may be used to cause the scheduler to prefer nodes where data for a stateful container is located. As part of the prioritization request from the scheduler to the extension, the scheduler can pass in details about containers such as the volumes that are being used by the container. The extender may then receive the query and check if the container is using any consistent volumes backed by the software-defined storage solution. If there is such a consistent volume, then the extender may query the storage driver to identify the nodes where the data is located.

The process of making a virtual storage volume available for writing on a disk attached to a particular storage node is referred to as “mounting”. Importantly, the clustered storage solution must ensure that a virtual storage volume is mounted for writing by no more than a single node, since simultaneous writes by different nodes tend to quickly corrupt a storage volume.

In a clustered storage solution for containers, schedulers are in charge of moving around volume mount points across a cluster so that containers always have access to their consistent data. Examples of schedulers may include, but are not limited to: Kubernetes, Mesos, and Swarm.

According to various embodiments, a set of nodes may be initialized to provide an array of software services such as web applications supported by databases and web servers. Because many of these applications rely on reading or writing data to and from storage devices, a storage driver may be used to attach virtual volumes to nodes to provide access to storage.

According to various embodiments, a containerized application system in which software services are provided by application instances implemented across multiple nodes provides several advantages, such as scalability and dependency management. However, such a configuration creates substantial performance challenges. For example, if a database is implemented on one node while the storage volume used to manage data accessed by the database is mounted on another node, then performance may degrade significantly because every database read or write may require inter-node communication.

According to various embodiments, performance may be improved by employing a converged architecture. Converged storage is a storage architecture that combines storage and computing resources into a single entity. For example, by locating on the same node both a web application that serves files and the virtual volume at which those files are stored, performance may be improved.

In some embodiments, a hyperconverged architecture extends the concept of convergence to a virtualized architecture such as a containerized application system. Hyperconverged storage is a software-defined approach to storage management that combines storage, computation, and virtualization in a physical unit that is managed as a single system. In contrast to a converged architecture, storage in a hyperconverged architecture need not be directly attached to a physical server, but rather may be accessible as a virtualized storage solution with the physical storage located at a different network endpoint.

Despite the performance benefits of a hyperconverged architecture, implementing such a system in a scalable and distributed fashion presents significant challenges under conventional approaches. For example, the particular scheduling decisions made when supporting a hyperconverged architecture are highly dependent upon the nature of the storage solution and applications. Thus, it is anticipated that a one-size-fits-all hyperconvergence solution implemented in a scheduler will fail to accommodate the specific needs of the various possible arrangements of distributed storage and applications and thus fail to provide substantial performance benefits.

Alternately, a conventional system may include a customized scheduler that is designed to support hyperconvergence in a specific context that includes a particular configuration of distributed storage and application container instances. However, when the standard scheduler on which the customized scheduler is based is updated, then the customized component of the customized scheduler must also be updated to account for these changes. Thus, a customized scheduler may require constant maintenance and/or may quickly fall out of date.

According to various embodiments, techniques and mechanisms described herein allow a distributed and containerized application system to achieve hyperconvergence without altering the standard scheduler provided in such a system. In this way, the benefits of hyperconvergence may be obtained while at the same time retaining the benefits of a secure and updated scheduler. Further, hyperconvergence may be obtained without requiring expensive, complex, and costly alterations to a standardized scheduler.

According to various embodiments, techniques and mechanisms described herein provide for substantially improved performance of the computer itself under some configurations. For example, by achieving a hyperconverged architecture, inter-node network traffic may be substantially reduced. Further, application response time may be reduced by reducing the time required for storage-related operations. Also, the hyperconverged architecture provides for improved scalability since additional nodes may be added to the system without substantially increasing the node-to-node network traffic.

In some embodiments, techniques and mechanisms described herein provide for a scheduler extender that includes one or more modules that each extend the functionality of the scheduler. The scheduler extender may serve as an API that provides a point at which different modules may attach. Under such an architecture, the prioritization module and the scheduler may be independently architected and updated.

In particular embodiments, many native container schedulers allow extensions to be implemented which can be used to provide additional intelligence to the scheduler. These extensions can be used to instruct the scheduler to exclude nodes under maintenance or to prefer nodes with more resources available.

According to various embodiments, techniques and mechanisms described herein provide for substantially improved performance of applications that include multiple containers implemented on different nodes and that employ software-defined storage. These performance gains may be provided without modifying the native scheduler application. The extender may also reschedule or refuse to schedule application container instances on nodes that are in a failed or errored state.

According to various embodiments, the prioritization module may support a “best effort” approach when scheduling application container instances. For example, the prioritization module may provide prioritization information for a requested application container instance that indicates which nodes would provide the greatest performance benefits for locating the application container instance.

In particular embodiments, the performance of many or most stateful containerized applications may benefit from hyperconvergence prioritization. Applications that may particularly benefit include applications with frequent syncs and small reads and/or writes, such as database applications.

Techniques and mechanisms described herein may facilitate the operation of a scalable storage container node system. In some embodiments, a scalable storage container node system may allow application containers in a virtualized application system to quickly and directly provision and scale storage. Further, the system may be configured to provide one or more user experience guarantees across classes of applications. According to various embodiments, the system may pool the capacity of different services into virtual storage volumes and auto-allocate storage as application storage traffic scales or bursts. For instance, a single virtual storage volume may include hundreds or thousands of terabytes of storage space aggregated across many different storage devices located on many different physical machines.

In some embodiments, storage containers may communicate directly with server resources such as hardware storage devices, thus reducing or eliminating unnecessary virtualization overhead. Storage containers may be configured for implementation in a variety of environments, including both local computing environments and cloud computing environments. In some implementations, storage volumes created according to the techniques and mechanisms described herein may be highly failure-tolerant. For example, a virtual storage volume may include data stored on potentially many different storage nodes. A storage node may fail for any of various reasons, such as hardware failure, network failure, software failure, or server maintenance. Data integrity may be maintained even if one or more nodes that make up a storage volume fail during data storage operations.

According to various embodiments, a storage system with components located across different computing devices is referred to herein as a “distributed storage system.” Alternately, or additionally, such a storage system may be referred to herein as a “clustered storage system.”

FIG. 1 illustrates an example of a storage container node 102. According to various embodiments, a storage container node may be a server configured to include a container engine and a privileged storage container. The storage container node 102 shown in FIG. 1 includes a server layer 104, an operating system layer 106, a container engine 108, a web server container 110, an email server container 112, a web application container 114, and a privileged storage container 116.

In some embodiments, the storage container node 102 may serve as an interface between storage resources available at a server instance and one or more virtual storage volumes that span more than one physical and/or virtual server. For example, the storage container node 102 may be implemented on a server that has access to a storage device. At the same time, a different storage container node may be implemented on a different server that has access to a different storage device. The two storage nodes may communicate to aggregate the physical capacity of the different storage devices into a single virtual storage volume. The single virtual storage volume may then be accessed and addressed as a unit by applications running on the two storage nodes or on another system.

In some embodiments, the storage container node 102 may serve as an interface between storage resources available at a server instance and one or more virtual storage volumes that are replicated across more than one physical and/or virtual server. For example, the storage container node 102 may be implemented on a server that has access to a storage volume implemented on one or more storage devices. At the same time, a different storage container node may be implemented on a different server that has access to the same storage volume. The two storage nodes may then each access data stored on the same storage volume. Additional details regarding the configuration of multiple storage container nodes in the same system are discussed with respect to FIG. 2.

At 104, the server layer is shown. According to various embodiments, the server layer may function as an interface by which the operating system 106 interacts with the server on which the storage container node 102 is implemented. A storage container node may be implemented on a virtual or physical server. For example, the storage container node 102 may be implemented at least in part on the server shown in FIG. 5. The server may include hardware such as networking components, memory, physical storage devices, and other such infrastructure. The operating system layer 106 may communicate with these devices through a standardized interface provided by the server layer 104.

At 106, the operating system layer is shown. According to various embodiments, different computing environments may employ different operating system layers. For instance, a physical or virtual server environment may include an operating system based on Microsoft Windows, Linux, or Apple's OS X. The operating system layer 106 may provide, among other functionality, a standardized interface for communicating with the server layer 104.

At 108, a container engine layer is shown. According to various embodiments, the container layer may provide a common set of interfaces for implementing container applications. For example, the container layer may provide application programming interfaces (APIs) for tasks related to storage, networking, resource management, or other such computing tasks. The container layer may abstract these computing tasks from the operating system. A container engine may also be referred to as a hypervisor, a virtualization layer, or an operating-system-virtualization layer.

In some implementations, the separation of the computing environment into a server layer 104, an operating system layer 106, and a container engine layer 108 may facilitate greater interoperability between software applications and greater flexibility in configuring computing environments. For example, the same software container may be used in different computing environments, such as computing environments configured with different operating systems on different physical or virtual servers.

A storage container node may include one or more software containers. For example, the storage container node 102 includes the web server container 110, the email server container 112, and the web application container 114. A software container may include customized computer code configured to perform any of various tasks. For instance, the web server container 110 may provide files such as webpages to client machines upon request. The email server 112 may handle the receipt and transmission of emails as well as requests by client devices to access those emails. The web application container 114 may be configured to execute any type of web application, such as an instant messaging service, an online auction, a wiki, or a webmail service. Although the storage container node 102 shown in FIG. 1 includes three software containers, other storage container nodes may include various numbers and types of software containers.

At 116, a privileged storage container is shown. According to various embodiments, the privileged storage container may be configured to facilitate communications with other storage container nodes to provide one or more virtual storage volumes. A virtual storage volume may serve as a resource for storing or retrieving data. The virtual storage volume may be accessed by any of the software containers 110, 112, and 114 or other software containers located in different computing environments. For example, a software container may transmit a storage request to the container engine 108 via a standardized interface. The container engine 108 may transmit the storage request to the privileged storage container 116. The privileged storage container 116 may then communicate with privileged storage containers located on other storage container nodes and/or may communicate with hardware resources located at the storage container node 102 to execute the request.
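
By way of illustration only, the following Python sketch shows one way the request path just described might be arranged, with an ordinary container talking only to the container engine's standardized interface and the engine relaying storage traffic to the privileged storage container. The class and method names (ContainerEngine, PrivilegedStorageContainer, storage_request, handle) are hypothetical and are not drawn from Docker, Kubernetes, or any other particular product.

    # Hypothetical sketch of the storage request path; not an actual container API.
    class PrivilegedStorageContainer:
        """Has elevated access to local storage devices and to peer nodes."""

        def __init__(self, local_volumes, peer_nodes):
            self.local_volumes = local_volumes
            self.peer_nodes = peer_nodes

        def handle(self, request):
            # Serve the request from local hardware when possible; otherwise
            # forward it to a privileged storage container on another node.
            if request["volume"] in self.local_volumes:
                return {"status": "ok", "served_by": "local"}
            return {"status": "ok", "served_by": self.peer_nodes[0]}

    class ContainerEngine:
        """Standardized interface presented to ordinary software containers."""

        def __init__(self, storage_container):
            self.storage_container = storage_container

        def storage_request(self, request):
            # Ordinary containers talk only to the engine; the engine relays
            # storage traffic to the privileged storage container.
            return self.storage_container.handle(request)

    engine = ContainerEngine(
        PrivilegedStorageContainer(local_volumes={"vol-a"}, peer_nodes=["node-b"]))
    print(engine.storage_request({"volume": "vol-a", "op": "read", "key": "file1"}))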

In some implementations, one or more software containers may be afforded limited permissions in the computing environment in which they are located. For example, in order to facilitate a containerized software environment, the software containers 110, 112, and 114 may be restricted to communicating directly only with the container engine 108 via a standardized interface. The container engine 108 may then be responsible for relaying communications as necessary to other software containers and/or the operating system layer 106.

In some implementations, the privileged storage container 116 may be afforded additional privileges beyond those afforded to ordinary software containers. For example, the privileged storage container 116 may be allowed to communicate directly with the operating system layer 106, the server layer 104, and/or one or more physical hardware components such as physical storage devices. Providing the storage container 116 with expanded privileges may facilitate efficient storage operations such as storing, retrieving, and indexing data.

FIG. 2 illustrates an example of an arrangement of components in a containerized storage system 200, configured in accordance with one or more embodiments. The storage system 200 includes a master node 202 in communication with a plurality of application nodes 210, 212, and 214. Each node has implemented thereon a storage driver 216. In addition, the master node includes a scheduler 204 that has access to an extender 206 that includes a prioritization module 208. Each node can mount one or more of a plurality of virtual volumes 230, 232, 234, and 236. Each virtual volume can include storage space on one or more of a plurality of storage disks 242, 244, 246, and 248 in a storage pool 240.

According to various embodiments, the clustered storage system 200 shown in FIG. 2 may be implemented in any of various physical computing contexts. For example, some or all of the components shown in FIG. 2 may be implemented in a cloud computing environment such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud. As another example, some or all of the components shown in FIG. 2 may be implemented in a local computing environment such as on nodes in communication via a local area network (LAN) or other privately managed network.

In some implementations, a node is an instance of a container system implemented on a computing device such as the computing device shown in FIG. 5. In some configurations, multiple nodes may be implemented on the same physical computing device. Alternately, a computing device may contain a single node. An example configuration of a container node is discussed in further detail with respect to FIG. 1.

According to various embodiments, each node may be configured to instantiate and execute one or more containerized application instances. Each node may include many components not shown in FIG. 2. These components may include hardware and/or software components, such as those discussed with respect to FIG. 1 and FIG. 5.

According to various embodiments, each node may include a storage driver 216. The storage driver 216 may perform any of various types of storage-related operations for the node. For example, the storage driver 216 may facilitate the mounting or unmounting of virtual storage volumes. As another example, the storage driver 216 may facilitate data storage or retrieval requests associated with a mounted virtual storage volume. In some embodiments, the storage driver 216 may be substantially similar or identical to the privileged storage container 116 shown in FIG. 1.

According to various embodiments, each node may include a scheduler agent 260. The scheduler agent 260 may facilitate communications between nodes. For example, the scheduler 204 in the master node may communicate with the scheduler agent 260. The scheduler agent 260 may then communicate with the storage driver 216 to perform an operation such as initiating an application container instance or unmounting a virtual volume.

In some implementations, the disks 242, 244, 246, and 248 may be accessible to the container nodes via a network. For example, the disks may be located in storage arrays containing potentially many different disks. In such a configuration, which is common in cloud storage environments, each disk may be accessible to potentially many nodes. A storage pool such as the pool 240 may include potentially many different disks.

According to various embodiments, the virtual storage volumes 230, 232, 234, and 236 are logical storage units created by the distributed storage system. Each virtual storage volume may be implemented on a single disk or may span potentially many different physical disks. At the same time, data from potentially many different virtual volumes may be stored on a single disk. In this way, a virtual storage volume may be created that is potentially much larger than any available physical disk. At the same time, a virtual storage volume may be created in such a way as to be robust to the failure of any individual physical disk. Further, the virtual storage volume may be created in such a way as to allow rapid and simultaneous read access by different nodes. Thus, a single virtual storage volume may support the operation of containerized applications implemented in a distributed fashion across potentially many different nodes.

According to various embodiments, a virtual volume can be replicated across multiple nodes, for instance to support read-only access by different nodes. For example, in FIG. 2, the virtual volume A 230 is replicated across Node A 210 and Node B 212.

According to various embodiments, a virtual volume can be aggregated across multiple nodes. Such a configuration may support distributed and parallel reads and writes to and from the volume. For example, the virtual volume B1 232 and the virtual volume B2 234 shown in FIG. 2 are different data portions of the same virtual volume B.

According to various embodiments, each node may be configured to implement one or more instances of one or more containerized storage applications. For example, the node A 210 includes application instances corresponding with the metadata server 218 and the App 1 222, while the node B 212 includes application instances corresponding with the Job Server 220 and the App 1 224. In some configurations, more than one instance of an application container may be implemented at once. For example, the Node N 214 includes an instance of the application container App 2 228 as well as App 1 226.

In some implementations, the metadata server 218 may be configured to perform any or all of a variety of tasks related to the ingestion, storage, tracking, and management of files across the virtual storage volumes. For example, the metadata server 218 may be configured to receive a list of files and then distribute the files across the virtual storage volumes. Techniques related to the ingestion and distribution of files are discussed in further detail with respect to FIG. 9. As another example, the metadata server 218 may be configured to receive an identifier associated with a file and to respond with a message indicating one or more virtual storage volumes on which the file is stored. As yet another example, the metadata server 218 may be configured to receive an identifier associated with a file and to then distribute the file across some or all of the virtual storage volumes.
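
The following Python sketch illustrates, purely hypothetically, the kind of bookkeeping such a metadata server might perform, assuming a simple round-robin placement policy on ingest. The MetadataServer class and its methods are illustrative stand-ins rather than an actual product API.

    # Hypothetical sketch of metadata-server bookkeeping; round-robin placement assumed.
    import itertools

    class MetadataServer:
        def __init__(self, volumes):
            self.volumes = list(volumes)
            self._cycle = itertools.cycle(self.volumes)
            self.placement = {}            # file identifier -> set of volume identifiers

        def ingest(self, file_ids):
            """Distribute newly ingested files across the virtual storage volumes."""
            for file_id in file_ids:
                self.placement[file_id] = {next(self._cycle)}

        def lookup(self, file_id):
            """Return the volume(s) on which a designated file is stored."""
            return self.placement[file_id]

        def distribute(self, file_id, targets=None):
            """Replicate a file onto additional volumes, e.g. when many jobs need it."""
            self.placement[file_id] |= set(targets) if targets else set(self.volumes)

    meta = MetadataServer(volumes=["vol-A", "vol-B1", "vol-B2", "vol-C"])
    meta.ingest(["train-000", "train-001", "train-002"])
    print(meta.lookup("train-001"))        # {'vol-B1'} under round-robin placement
    meta.distribute("train-001")           # now present on every volume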

In some embodiments, the job server 220 may be implemented on the same node as the metadata server or may be implemented on a different node, as shown in FIG. 2. The job server 220 may be configured to perform any or all of a variety of tasks related to the instantiation, execution, and termination of compute jobs. For example, the job server 220 may be configured to receive a job specification and to then instantiate a number of compute jobs based on the job specification. Techniques related to the instantiation of compute jobs are discussed in additional detail with respect to FIG. 10. As another example, the job server 220 may be configured to monitor jobs for completion and then instruct the metadata server to distribute result files generated by the job. Techniques related to the monitoring of jobs for completion are discussed in additional detail with respect to FIG. 11.
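
Continuing the illustrative sketch above, a job server along these lines might be organized as follows. The launch_container helper and the volume-to-node map are hypothetical placeholders for whatever scheduler interface and cluster metadata an actual deployment would expose; this is a sketch, not a description of a particular implementation.

    # Hypothetical sketch of a job-server control loop; builds on the MetadataServer sketch above.
    def launch_container(node, image, files):
        # Stand-in for a request to the scheduler to start a containerized job on a node.
        print(f"launching {image} on {node} with {sorted(files)}")
        return {"node": node, "image": image, "files": files, "done": False, "results": []}

    class JobServer:
        def __init__(self, metadata_server, volume_to_node):
            self.meta = metadata_server
            self.volume_to_node = volume_to_node   # which node has each volume mounted
            self.jobs = []

        def start(self, job_spec):
            """Instantiate one compute job per input file, co-located with its data."""
            for file_id in job_spec["files"]:
                volume = next(iter(self.meta.lookup(file_id)))
                node = self.volume_to_node[volume]
                self.jobs.append(launch_container(node, job_spec["image"], {file_id}))

        def poll(self):
            """When a job completes, ask the metadata server to distribute its result files."""
            for job in self.jobs:
                if job["done"]:
                    for result in job["results"]:
                        self.meta.distribute(result)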

According to various embodiments, each of the job instances may be implemented as an instance of a containerized application. For example, a job instance may be an instance of a containerized application related to a machine learning operation. Different job instances may be provided with different configuration information, such as different lists of files to analyze or process.

In particular embodiments, an application container may correspond to any of a wide variety of containerized applications. For example, as discussed with respect to FIG. 1, a containerized application may be a web server 110, an email server 112, a web application 114, a database, or any of many other types of applications.

In some embodiments, the master node 202 is configured to manage the operations of the clustered storage system. For example, the scheduler 204 at the master node 202 may be configured to receive a request to mount a virtual volume for use at a particular node. The scheduler 204 may then communicate with that node to provide instructions to mount the virtual volume.

According to various embodiments, the scheduler 204 may be implemented as a standardized component of the containerized application system. The extender 206 may serve as a system to extend the functioning of the scheduler. For instance, the extender 206 may implement one or more modules that provide additional logic governing operations such as the scheduling of application container instances on distributed nodes.

In some embodiments, the scheduler 204 at the master node 202 may be configured to receive a request to load an application container instance onto a node. The scheduler 204 may then communicate with the prioritization module 208 to select a suitable node and then communicate with that node to provide instructions to load the application container instance. Techniques regarding application instance prioritization are discussed in additional detail with respect to FIGS. 3 and 4.

FIG. 3 illustrates an example of a method 300 for controlling the loading of a container on a containerized application node, performed in accordance with one or more embodiments. The method 300 may be implemented on a master node in a distributed computing system. For instance, the method 300 may be performed on the scheduler 204 shown in FIG. 2.

At 302, a request is received to instantiate a containerized application on an application node. According to various embodiments, the request may be generated in any of various ways. For example, the request may be manually generated by a systems administrator or may be generated automatically, such as in the course of executing a configuration script.

At 304, a prioritization request is transmitted to a prioritization module. According to various embodiments, the prioritization module may be implemented as an extension within a scheduler extender. The scheduler extender may serve to extend the functioning of the scheduler, which may be implemented as a native component of the containerized application system. For example, the scheduler may be implemented as a native component of a system such as Docker or Kubernetes.

At 306, prioritization information is determined for the containerized application. According to various embodiments, the prioritization information may be determined based on whether a virtual storage volume employed by the containerized application to store or retrieve data is mounted at an application node. Additional details regarding the determination of prioritization information are discussed with respect to the method 400 shown in FIG. 4.

At 308, an application node is selected based on the prioritization information. According to various embodiments, the application node that has the highest prioritization may be selected. If two application nodes have equally high prioritization, then one may be selected at random.

At 310, a containerized application instantiation message is transmitted to the selected application node. According to various embodiments, the containerized application instantiation message may identify the application container for the container engine on the application node to instantiate.

In particular embodiments, the containerized application instantiation message may be transmitted as part of native communications between the scheduler and the application node. For example, the native scheduler application may transmit the containerized application instantiation message to a native scheduler agent at the application node via a native application programming interface that defines the communications between these components.

At 312, a determination is made as to whether the containerized application was successfully instantiated. In some embodiments, when a scheduler agent at an application node receives a containerized application instantiation message, it may instruct the container engine at the application node to instantiate the application container. Then, when the containerized application is successfully instantiated, the scheduler agent may send a response message to the scheduler at the master node to confirm the instantiation.

At 314, if the instantiation is successful, then application instance information is recorded. According to various embodiments, the application instance information may be stored in a place accessible to the master node. The application instance information may indicate which application containers are instantiated on which application nodes.
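
A compact, purely illustrative rendering of operations 304 through 314 is given below. The score_nodes callable stands in for the prioritization module of FIG. 4, and the per-node agents object stands in for whatever native scheduler-agent interface the containerized application system actually provides; none of these names come from a real scheduler API.

    # Hypothetical sketch of the master-node flow of FIG. 3 (operations 304-314).
    import random

    def schedule_instance(container_id, score_nodes, agents, instances):
        """Select the best application node for a container and instantiate it there."""
        scores = score_nodes(container_id)                 # 304/306: ask the prioritization module
        best = max(scores.values())
        node = random.choice(                              # 308: highest priority, random tie-break
            [n for n, s in scores.items() if s == best])
        ok = agents[node].instantiate(container_id)        # 310/312: send message, await confirmation
        if ok:
            instances.setdefault(node, []).append(container_id)   # 314: record instance information
        return node if ok else None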

FIG. 4 illustrates an example of a method 400 for prioritizing container loading, performed in accordance with one or more embodiments. The method 400 may be performed at one or more components implemented on a master node in a distributed computing system. For instance, the method 400 may be performed at the prioritization module 208 shown in FIG. 2.

At 402, a prioritization request is received. According to various embodiments, the prioritization request may identify one or more application containers identified for scheduling by the scheduler. In some instances, the request may identify a number of instances of an application container to be scheduled. As part of the prioritization request from the scheduler to the extension, the scheduler can pass in details about containers such as the volumes that are being used by the container.

At 404, an application container associated with the prioritization request is identified. An application container may be identified by, for instance, a unique identification number associated with the application container. Such an identifier may be included with the request received at operation 402.

At 406, one or more virtual storage volumes associated with the application container are identified. According to various embodiments, a virtual storage volume may be identified as being associated with the application container instances based on configuration information. For example, a database may include one or more entries for each application container available for instantiation on the system. The database may also indicate which virtual storage volume or volumes are associated with the application container.

In some embodiments, a virtual storage volume that is associated with an application container is one on which an instance of the application container may store data or from which it may retrieve data. For example, if the application container includes a database application, then a virtual storage volume used to store data records included in the database may be identified as associated with the application container. As another example, if the application container includes a webserver, then a virtual storage volume used to store files served by the webserver may be identified as associated with the application container.

In particular embodiments, information about which storage volumes are mounted on each application node may be maintained at the master node. For example, the scheduler at the master node may maintain a database that includes such information. As another example, a storage driver implemented at the master node may be configured to provide such information upon request.

At 408, an application node is selected for prioritization. According to various embodiments, the application nodes may be prioritized sequentially, in parallel, or in any suitable order.

At 410, node performance information is determined for the selected application node. In some embodiments, the node performance information may include any information characterizing a current state of software and/or hardware associated with the selected application node. For example, the node performance information may indicate whether the selected application node is in a failed, errored, or non-responding state. As another example, the node performance information may indicate a portion or amount of used or unused computing resources such as memory or processor time at the selected application node. In particular embodiments, node performance information may not be used by the scheduler extension, and may instead be used by the native scheduler, for instance to break ties in node priority as provided by the scheduler extension.

At 412, a determination is made as to whether any of the storage volumes associated with the application container are mounted on the selected application node. In some implementations, information about which storage volumes are mounted on each application node may be maintained at the master node. For example, the scheduler at the master node may maintain a database that includes such information. As another example, a storage driver implemented at the master node may be configured to provide such information upon request.

At 414, application container prioritization information is determined for the selected application node. According to various embodiments, any of various prioritization schemes may be used. For example, each node may be assigned a score between 0 and 1, between 0 and infinity, or along any suitable range. Regardless of the particular scheme, an application node may be assigned a higher priority for an application instance when that application node has mounted thereon one or more virtual volumes used by the application container corresponding with the instance.

At 416, a determination is made as to whether to select an additional application node for prioritization. According to various embodiments, successive application nodes may be selected for prioritization until all, or a suitable proportion, of the identified application nodes are prioritized. At 418, the application container prioritization information is provided to the scheduler if no additional application container nodes are selected for prioritization.
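
One hypothetical way to render operations 402 through 418 in code is sketched below. The volumes_for, mounted_volumes, and node_healthy callables stand in for the configuration database and node performance information described above, and the scoring convention (0 for an unresponsive node, 10 as a baseline, 100 per required volume mounted locally) simply mirrors the example values used with FIG. 6; none of this is a prescribed scheme.

    # Hypothetical sketch of the prioritization module of FIG. 4; scoring mirrors FIG. 6.
    def prioritize(container_id, nodes, volumes_for, mounted_volumes, node_healthy):
        required = set(volumes_for(container_id))             # 404-406: volumes used by the container
        scores = {}
        for node in nodes:                                    # 408/416: consider each application node
            if not node_healthy(node):                        # 410: failed or non-responding node
                scores[node] = 0
                continue
            local = required & set(mounted_volumes(node))     # 412: required volumes mounted here
            scores[node] = 100 * len(local) if local else 10  # 414: higher priority for local data
        return scores                                         # 418: returned for scheduling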

FIG. 5 illustrates one example of a server. According to particular embodiments, a system 500 suitable for implementing particular embodiments of the present invention includes a processor 501, a memory 503, an interface 511, and a bus 515 (e.g., a PCI bus or other interconnection fabric) and operates as a container node. When acting under the control of appropriate software or firmware, the processor 501 is responsible for implementing applications such as an operating system kernel, a containerized storage driver, and one or more applications. Various specially configured devices can also be used in place of a processor 501 or in addition to processor 501. The interface 511 is typically configured to send and receive data packets or data segments over a network.

Particular examples of interfaces supported include Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided such as fast Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile RAM. The independent processors may control communications-intensive tasks such as packet switching, media control and management.

According to various embodiments, the system 500 is a server configured to run a container engine. For example, the system 500 may be configured as a storage container node as shown in FIGS. 1 and 2. The server may include one or more hardware elements as shown in FIG. 5. In some implementations, one or more of the server components may be virtualized. For example, a physical server may be configured in a localized or cloud environment. The physical server may implement one or more virtual server environments in which the container engine is executed. Although a particular server is described, it should be recognized that a variety of alternative configurations are possible. For example, the modules may be implemented on another device connected to the server.

FIG. 6 shows an example configuration of nodes 602, 604, 606, 608, and 610. Virtual storage volume V1 612 is mounted at node 602 and node 604. Virtual storage volume V2 614 is mounted at node 604 and node 606. Suppose for the purpose of illustration that an application is prioritized for instantiation in the configuration of nodes shown in FIG. 6. Suppose further that the application prioritized for instantiation involves access to both volume V1 612 and volume V2 614. In this situation, Node 5 610 may be assigned a prioritization of 0 because it is not responding to requests and therefore may be in an errored or failed state. Node 4 608 may be assigned a prioritization of 10 because it is available for instantiation but has neither V1 612 nor V2 614 mounted thereon. Node 1 602 and Node 3 606 may each be assigned a prioritization of 100 because they each have one of the virtual storage volumes mounted thereon. Node 2 604 may be assigned a prioritization of 200 because it has both of the virtual storage volumes mounted thereon.
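
Applying the illustrative prioritize sketch given above in connection with FIG. 4 to this configuration reproduces the example scores; the node names and the mounted/healthy maps below are, of course, hypothetical.

    # The FIG. 6 example: the application needs V1 and V2; node 5 is unresponsive,
    # node 4 has neither volume, nodes 1 and 3 each have one, and node 2 has both.
    mounted = {"node1": {"V1"}, "node2": {"V1", "V2"}, "node3": {"V2"},
               "node4": set(), "node5": set()}
    healthy = {"node1": True, "node2": True, "node3": True,
               "node4": True, "node5": False}

    print(prioritize("app", mounted,
                     volumes_for=lambda c: {"V1", "V2"},
                     mounted_volumes=lambda n: mounted[n],
                     node_healthy=lambda n: healthy[n]))
    # {'node1': 100, 'node2': 200, 'node3': 100, 'node4': 10, 'node5': 0}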

FIG. 7 shows an alternate configuration of nodes and components, provided in accordance with one or more embodiments. FIG. 7 includes a master node 702 in communication with node 1 704, node 2 706, and node 3 708. As with other configurations shown herein, a system may include various numbers and configurations of nodes.

In the example shown in FIG. 7, the master node 702 includes a framework 714, which is also present on each of node 1 704, node 2 706, and node 3 708. According to various embodiments, the framework 714 may correspond to an instantiation of Apache Mesos. As such, the framework may be used to schedule tasks and/or manage resources among the various nodes. The framework at the master node 702 may be configured to identify and track resources on the dependent nodes, which include node 1 704, node 2 706, and node 3 708.

Each dependent node may have one or more containerized applications, including applications 718, applications 722, and applications 726. One or more dependent nodes may also include an instance of the storage driver 216. In addition, one or more dependent nodes may also have mounted thereon one or more virtual volumes, such as the virtual volume 724.

According to various embodiments, computing resources such as CPU time, communication ports, and memory space may be available on one or more of the nodes shown in FIG. 7. When a framework at a focal node receives a request to schedule a task, it may receive offers of resources on other nodes in the system from the framework at the master node. Each resource offer may designate one or more of the nodes and indicate the resources available on the designated nodes. The focal node may then accept or reject the resource offer. When a resource offer is accepted, the focal node may schedule the task for execution on the designated node associated with the resource offer.

FIG. 8 illustrates an alternate method 800 for container loading, performed in accordance with one or more embodiments. The method 800 may be implemented at a focal node, such as any of the nodes shown in FIG. 7. At 802, a request to schedule a task is received at a framework on a focal node. The request may be received from an application, from a systems administrator, from a configuration script, or from any other source. The request may indicate a particular task, such as the instantiation of a containerized application. The request may also indicate one or more virtual volumes associated with the execution of the task.

At 804, a resource offer designating a node is received from the framework at the master node. According to various embodiments, the framework at the master node may track resources available on each of the dependent nodes in the system. For instance, the framework at the master node may receive a message from the focal node indicating that a task needs to be scheduled. The framework at the master node may then review the resources available on nodes in the cluster and respond to the message with an offer of resources on one or more of the dependent nodes. The offer may specify information such as an amount of CPU time or cores, one or more communication ports, and/or an amount of memory storage available on one or more of the dependent nodes.

At 806, a determination is made as to whether the task requires access to a virtual storage volume. According to various embodiments, information about the storage volumes accessed by the task may be included with the request received at operation 802. Alternately, or additionally, the system may maintain a record such as a database that indicates which virtual volumes are required by which containerized applications.

At 808, a determination is made as to whether the designated node includes an instance of the storage driver. According to various embodiments, the determination may be made in any of various ways. For example, each instance of the storage driver on a node within the cluster may maintain a record of which other nodes also include an instance of the storage driver. As another example, the storage driver at the focal node may communicate with the designated node to determine whether an instance of the storage driver is present.

At 810, a determination is made as to whether the required virtual storage volume is mounted at the designated node. The determination may be made in any of various ways. For example, the storage driver at the focal node may communicate with the storage driver at the designated node, the master node, or any other node to identify this information.

At 812, if the designated node does not include the virtual storage volume, a determination is made as to whether any node on which the virtual storage volume is mounted includes sufficient resources for the task. According to various embodiments, the determination may be made at least in part based on communication with the master node. For example, the framework at the focal node may request from the master node an indication of which resources are available on which nodes in the system.

At 816, a determination is made as to whether the offer is able to fulfill other resource requirements associated with the task. For example, the offer may specify resources such as CPU time, memory space, communication ports, and other such resources. The task may also be associated with one or more resource requirements associated with the execution of the task. Accordingly, before accepting the resource offer, the system may determine whether the resource offer includes sufficient resources for the execution of the task.

At 814, the offer of resources is rejected. The offer of resources may be rejected if the task requires a virtual storage volume and the designated node does not include the storage driver, since in this case the virtual storage volume may be inaccessible. The offer of resources may also be rejected if another node both has the required virtual storage volume mounted thereon and has adequate resources for executing the task, since in this case the focal node may wait to receive an offer for resources on that other node to achieve hyperconvergence.
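
Operations 806 through 816 can be read together as a single accept-or-reject decision. The sketch below is a minimal illustration of that logic, reusing the illustrative ResourceOffer and Task structures sketched earlier and assuming hypothetical cluster helpers (has_storage_driver, volume_mounted_on, nodes_with_volume, has_capacity) that stand in for the storage driver and master node queries described above.

    # Illustrative decision logic for operations 806-816; the helper names are assumptions.
    def should_accept(offer, task, cluster) -> bool:
        node = offer.node_id
        if task.requires_volume:                                     # operation 806
            if not cluster.has_storage_driver(node):                 # operation 808
                return False                                         # volume would be inaccessible
            if not cluster.volume_mounted_on(task.volume_id, node):  # operation 810
                # Operation 812: if some other node already has the volume mounted
                # and enough capacity, reject and wait for an offer on that node.
                for other in cluster.nodes_with_volume(task.volume_id):
                    if cluster.has_capacity(other, task.requirements):
                        return False
        return offer_satisfies(offer, task)                          # operation 816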

At 818, the offer of resources is accepted. At 820, the task is scheduled on the designated node. According to various embodiments, accepting the offer of resources and scheduling the task may involve transmitting one or more messages or instructions to the master node and/or the designated node. The messages or instructions may include such information as an identifier associated with the offer of resources and information identifying the task to be executed.

In particular embodiments, an acceptance of a resource offer may indicate a portion of the total amount of offered resources. For example, a resource offer may specify 2 CPU cores and 8 GB of RAM. If the task requested for scheduling requires fewer resources than those offered, then the acceptance of the resource offer may specify, for instance, 1 CPU core and 4 GB of RAM. By specifying the amount of resources accepted, these resources may then be reserved on the designated node for the execution of the scheduled task.
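
A minimal sketch of that partial acceptance, under the same illustrative data structures, is shown below: the acceptance lists only the resources the task actually needs rather than everything offered.

    # Illustrative only: accept just the needed portion of the offered resources,
    # e.g. 1 of 2 offered CPU cores and 4 of 8 offered GB of RAM.
    def accepted_portion(offer, task) -> dict:
        return {name: min(offer.resources.get(name, 0.0), amount)
                for name, amount in task.requirements.items()}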

According to various embodiments, one or more of the operations described in FIG. 8 may be performed by a default scheduler native to the container management engine. In addition, one or more of the operations shown in FIG. 8 may be performed by a scheduler extension, as discussed herein. For example, one or more of operations 806, 808, 810, 812, or 814 may be implemented by a scheduler extension.

FIG. 9 illustrates an example of a method 900 for loading data in a distributed storage system, performed in accordance with one or more embodiments. In some implementations, the method 900 may be performed by a job server such as the job server 220 shown in FIG. 2.

At 902, a request is received to ingest data for distributed computation. According to various embodiments, the request may be generated manually or automatically. For example, a user may initiate a request to implement a machine learning procedure or Tensor Flow process. Alternately, the request may be generated dynamically, for instance when triggered by a designated triggering condition.

In some implementations, the request may include configuration information suitable for initiating file ingestion. Such information may include, but is not limited to: a list of files to ingest and a list of storage volumes for file storage.

At 904, a set of storage volumes for file storage is created or identified. In some implementations, the system may employ an existing set of storage volumes for file storage. For instance, each compute node may be associated with a respective virtual storage volume for storing files analyzed by jobs instantiated on that compute node. Alternately, or additionally, the system may create volumes dynamically as part of the execution of the method 900. For example, the system may determine the number of compute nodes and storage volumes needed based on factors such as the number of jobs requested and/or the amount of data ingested.
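
As one purely hypothetical example of such dynamic sizing, the number of volumes might be derived from the number of requested jobs and the amount of data to be ingested; the 100 GB per-volume figure below is an assumption for illustration only and is not prescribed by the method 900.

    # Hypothetical sizing heuristic; all figures and names are illustrative.
    import math

    def volumes_needed(num_jobs: int, data_size_gb: float, max_volume_gb: float = 100.0) -> int:
        # At least one volume per requested job, and enough volumes to hold the data.
        return max(num_jobs, math.ceil(data_size_gb / max_volume_gb))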

At 906, a file is identified for ingestion. According to various embodiments, files may be ingested sequentially, in parallel, or in any suitable order.

At 908, a consistent hashing function is applied to the file to identify a storage volume on which to store the file. According to various embodiments, the consistent hashing function may receive as input metadata such as the file name or other file characteristics. Then, the consistent hashing function may produce as an output an identifier associated with one of the virtual storage volumes identified in operation 904. Any suitable hashing function may be employed. For example, a variant of the MD5 hashing function may be applied to the file name. As another example, the bits of the filename may be summed and a modulo operator may be applied.
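
For illustration, the following sketch maps a file name onto one of the volumes identified at operation 904 using an MD5 digest and a modulo operation, one of the options mentioned above. A production implementation might instead use a consistent-hash ring so that adding or removing volumes remaps as few files as possible; the function and volume names here are assumptions.

    # Minimal sketch of operation 908: hash the file name to choose a volume.
    import hashlib
    from typing import List

    def volume_for_file(file_name: str, volume_ids: List[str]) -> str:
        digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
        return volume_ids[int(digest, 16) % len(volume_ids)]

    # Example: volume_for_file("train-0001.tfrecord", ["vol-a", "vol-b", "vol-c"])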

At 910, the file is stored on the identified storage volume. According to various embodiments, the file may be transmitted directly from the ingestion point to the identified storage volume and need not be received by the metadata server. For example, the metadata server may transmit an instruction to the compute node associated with the virtual storage volume to retrieve the identified file and store it on the identified storage volume.

At 912, a determination is made as to whether to ingest one or more additional files. In some implementations, the data loading method may continue to execute until all files are ingested. For instance, the request received at operation 902 may identify a specific list of files and/or directories containing files to ingest.

According to various embodiments, the operations performed as part of the method 900 as well as other methods discussed herein may be performed in serial or in parallel. For instance, multiple files may be distributed simultaneously in some configurations.
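
As a sketch of one parallel configuration of operations 906 through 912, the loop below submits each file to a worker pool, reusing the illustrative volume_for_file function above together with a hypothetical ingest_file callable that instructs the owning compute node to fetch and store the file (operation 910).

    # Illustrative parallel ingestion loop; ingest_file is a hypothetical helper.
    from concurrent.futures import ThreadPoolExecutor

    def ingest_files(file_names, volume_ids, ingest_file, max_workers=8):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(ingest_file, name, volume_for_file(name, volume_ids))
                       for name in file_names]
            for future in futures:
                future.result()  # surface any ingestion errors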

FIG. 10 illustrates an example of a method 1000 for executing one or more jobs, performed in accordance with one or more embodiments. According to various embodiments, the method 1000 may be performed on a job server, such as the job server 220 shown in FIG. 2.

At 1002, a request is received to execute a job specification. According to various embodiments, the job specification may include any suitable information for executing a number of jobs distributed across a computing cluster. For example, the job specification may identify a number of instances of a containerized application to instantiate. The job specification may also identify configuration information for each of the jobs. For instance, the job specification may indicate files assigned for analysis or processing by particular jobs.
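
Purely as an illustration of the kind of job specification described above, such a request might carry a structure along the following lines; the field names are assumptions rather than a defined schema.

    # Hypothetical job specification; field names are illustrative only.
    job_spec = {
        "application": "training-worker:1.4",   # containerized application to instantiate
        "instances": 2,                          # number of jobs to launch
        "jobs": [
            {"job_id": "job-0", "files": ["part-0000", "part-0001"]},
            {"job_id": "job-1", "files": ["part-0002", "part-0003"]},
        ],
    }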

A job is selected for launching at 1004. In some embodiments, jobs may be selected sequentially or according to any suitable ordering. After a job is selected, one or more files accessed by the job are identified at 1006. The files accessed by the job may be identified by, for instance, analyzing the job specification associated with the request received at operation 1002. For example, the job specification may indicate particular identifiers or ranges of file names associated with specific jobs.

At 1008, the job server communicates with the metadata server to identify the volume containing the identified files. As discussed with respect to FIG. 9, the metadata server may ingest files and then allocate each file to one or more volumes. The metadata server may then be configured to receive a request to locate a particular file and respond with information suitable for identifying one or more compute nodes on which the volume is located. For example, the metadata server may comply with the request by implementing an operation similar or identical to the operation 908 shown in FIG. 9.

At 1010, the job is launched on the compute node associated with the identified volume. In some implementations, the job may be launched by transmitting an instruction to a container scheduler, such as the scheduler 204 shown in FIG. 2. The instruction may indicate, for example, the identity of the application to be instantiated, the identity of the compute node on which the application is to be instantiated, and/or configuration information for the instantiation of the job. For example, the configuration information may indicate one or more files assigned to the job for analysis or processing.

In some instances, a job may require files from multiple volumes which could be spread across multiple nodes. In such instances, the job may be started on the node having the best data locality. For example, the node having the largest portion of the required files may be selected.
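
A minimal sketch of that locality heuristic follows, assuming hypothetical locate_volume and node_for_volume lookups that stand in for the metadata server query of operation 1008.

    # Illustrative locality selection: pick the node holding the most required files.
    from collections import Counter

    def best_node_for_job(job_files, locate_volume, node_for_volume):
        counts = Counter(node_for_volume(locate_volume(f)) for f in job_files)
        node, _ = counts.most_common(1)[0]
        return node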

At 1012, a determination is made as to whether to launch one or more additional jobs. In some implementations, the job execution method may continue to execute until all jobs are executed. For instance, the job specification may indicate a specific number of jobs to execute.

According to various embodiments, the operations performed as part of the method 1000 as well as other methods discussed herein may be performed in serial or in parallel. For instance, multiple jobs may be executed simultaneously in some configurations.

FIG. 11 illustrates an example of a method 1100 for terminating one or more jobs, performed in accordance with one or more embodiments. According to various embodiments, the method 1100 may be performed on a job server, such as the job server 220 shown in FIG. 2.

At 1102, a request to monitor jobs for termination is received. According to various embodiments, the request may be generated upon the execution of one or more jobs in order to properly respond to the termination of each job. For example, the method 1100 may be initiated when one or more jobs are executed in accordance with the method 1000 shown in FIG. 10.

At 1104, an indication is received that a job has been terminated. In some embodiments, a job may be configured to communicate with the job server when the job has completed. For example, the job may involve a computational task related to learning from a designated set of files in a machine learning operation. When this learning has been completed and a result set has been generated, the job may communicate with the job server to inform the job server that the task has been completed.

At 1106, the job server instructs the metadata server to distribute job results across the cluster. As discussed herein, in some embodiments the metadata server may be configured to copy a file from one volume to other volumes within the cluster. For example, a file generated as a result set in a learning phase of a machine learning computation may be copied across the cluster for use in a subsequent inferential phase in the machine learning computation.
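
The following sketch ties operations 1104 through 1108 together, assuming hypothetical wait_for_termination and replicate_result_set calls that stand in for the job server and metadata server interactions described above.

    # Illustrative termination loop; both callables are assumptions.
    def handle_terminations(pending_job_ids, wait_for_termination, replicate_result_set):
        while pending_job_ids:                        # operation 1108: keep waiting
            job = wait_for_termination()              # operation 1104: a job reports completion
            replicate_result_set(job.result_set)      # operation 1106: copy results across the cluster
            pending_job_ids.discard(job.job_id)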

At 1108, a determination is made as to whether to wait for one or more additional job terminations. In some implementations, the job termination method may continue to execute until all jobs are terminated. For instance, jobs may terminate at different times due to differences in when each job was started, what data each job was tasked with analyzing, and other such considerations.

In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the invention.

1. A computing system comprising: a plurality of computing nodes that each include a respective processor, a respective memory module, and a respective communications interface, each computing node being configured to execute a compute job upon request, each computing node including a storage interface configured to communicate with a respective virtual storage volume; a metadata server configured to distribute a plurality of files among the respective virtual storage volumes and to identify upon request the virtual storage volume associated with a designated one of the files; and a job server configured to initiate a respective one or more compute jobs on each of the plurality of computing nodes, each compute job accessing one or more files stored on the respective virtual storage volume associated with the respective computing node on which the respective compute job is initiated.
2. The computing system recited in claim 1, wherein each of the computing nodes includes a respective container engine application executed by an operating system, the container engine application providing a standardized platform for the instantiation and execution of containerized applications.
3. The computing system recited in claim 2, wherein the one or more containerized applications includes a storage driver configured to manage the respective virtual storage volume.
4. The computing system recited in claim 2, wherein the metadata server is implemented as a designated containerized application on a designated one of the plurality of computing nodes.
5. The computing system recited in claim 2, wherein the job server is implemented as a designated containerized application on a designated one of the plurality of computing nodes.
6. The computing system recited in claim 2, wherein each of the compute jobs is implemented as one of the containerized applications.
7. The computing system recited in claim 1, wherein the job server is further configured to monitor each of the compute jobs for completion.
8. The computing system recited in claim 7, wherein the job server is further configured to instruct the metadata server to replicate a designated result set associated with a designated one of the jobs across the virtual storage volumes when it is determined that the designated compute job has completed.
9. The computing system recited in claim 1, wherein the identification of the virtual storage volume associated with a designated file includes applying a consistent hashing function to data or metadata associated with the designated file.
10. The computing system recited in claim 1, wherein each of the jobs includes one or more tasks associated with training a machine learning model using data stored in the files stored in the virtual storage volume associated with the compute node on which the job is instantiated.
11. A method comprising: distributing, via a metadata server, a plurality of files among a plurality of virtual storage volumes, each of the virtual storage volumes in communication with a respective computing node via a respective storage interface, each of the computing nodes including a respective processor, a respective memory module, and a respective communications interface, wherein the metadata server is configured to identify upon request the virtual storage volume associated with a designated one of the files; and initiating, via a job server, a respective one or more compute jobs on each of the plurality of computing nodes, each compute job accessing one or more files stored on the respective virtual storage volume associated with the respective computing node on which the respective compute job is initiated.
12. The method recited in claim 11, wherein each of the computing nodes includes a respective container engine application executed by an operating system, the container engine application providing a standardized platform for the instantiation and execution of containerized applications.
13. The method recited in claim 12, wherein the one or more containerized applications includes a storage driver configured to manage the respective virtual storage volume.
14. The method recited in claim 12, wherein each of the compute jobs is implemented as one of the containerized applications.
15. The method recited in claim 12, wherein the job server is further configured to monitor each of the compute jobs for completion.
16. The method recited in claim 15, wherein the job server is further configured to instruct the metadata server to replicate a designated result set associated with a designated one of the jobs across the virtual storage volumes when it is determined that the designated compute job has completed.
17. The method recited in claim 11, wherein the identification of the virtual storage volume associated with a designated file includes applying a consistent hashing function to data or metadata associated with the designated file.
18. One or more non-transitory machine-readable media having instructions stored thereon for performing a method, the method comprising: distributing, via a metadata server, a plurality of files among a plurality of virtual storage volumes, each of the virtual storage volumes in communication with a respective computing node via a respective storage interface, each of the computing nodes including a respective processor, a respective memory module, and a respective communications interface, wherein the metadata server is configured to identify upon request the virtual storage volume associated with a designated one of the files; and initiating, via a job server, a respective one or more compute jobs on each of the plurality of computing nodes, each compute job accessing one or more files stored on the respective virtual storage volume associated with the respective computing node on which the respective compute job is initiated.
19. The one or more non-transitory machine-readable media recited in claim 18, wherein each of the computing nodes includes a respective container engine application executed by an operating system, the container engine application providing a standardized platform for the instantiation and execution of containerized applications.
20. The one or more non-transitory machine-readable media recited in claim 18, wherein the job server is further configured to monitor each of the compute jobs for completion, and wherein the job server is further configured to instruct the metadata server to replicate a designated result set associated with a designated one of the jobs across the virtual storage volumes when it is determined that the designated compute job has completed.